IRCAM Corpus Tools: Managing speech corpora
نویسندگان
چکیده
Corpus based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These usages range from unit selection speech synthesis to statistical modeling of speech phenomena like prosody or expressivity. In all cases, these usages require a wide range of tools for corpus creation, labeling, symbolic and acoustic analysis, storage and query. However, if a variety of tools exists for each of these individual tasks, they are rarely integrated into a single platform made available to a large community of researchers. In this paper, we propose IrcamCorpusTools, an open and easily extensible platform for analysis, query and visualization of speech corpora. It is already used for unit selection speech synthesis, for prosody and expressivity studies, and to exploit various corpora of spoken French or other languages. MOTS-CLÉS : parole, corpus, base de données, langage de requête, multimodalité.
منابع مشابه
Praaline: Integrating Tools for Speech Corpus Research
This paper presents Praaline, an open-source software system for managing, annotating, analysing and visualising speech corpora. Researchers working with speech corpora are often faced with multiple tools and formats, and they need to work with ever-increasing amounts of data in a collaborative way. Praaline integrates and extends existing time-proven tools for spoken corpora analysis (Praat, S...
متن کاملFor Standardised Amazigh Linguistic Resources
Amazigh language and culture may well be viewed to have known an unprecedented booming in Morocco : more than a hundredwhich are published by the Royal Institute of Amazigh Culture (IRCAM), an institution created in 2001 to preserve, promote and endorse Amazigh culture in all its dimensions. Crucially, publications in the Amazigh language would not have seen light without the valiant attempts t...
متن کاملQuerying Annotated Speech Corpora
This paper is concerned with querying annotated speech corpora. A growing number of such corpora is currently being created worldwide; however, their usefulness for a wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions for these problems are presented here: the XML-based data format TASX for corpus crea...
متن کاملFestvox: Tools for Creation and Analyses of Large Speech Corpora
This paper summarises the tools provided within Festvox[1], a freely available software suite for creation and analyses of large scale speech corpora for enabling research, development and instruction in speech technologies.
متن کاملConcordancing for parallel spoken language corpora
Concordancing is one of the oldest corpus analysis tools, especially for written corpora. In NLP concordancing appears in training of speech-recognition system. Additionally, comparative studies of different languages result in parallel corpora. Concordancing for these corpora in a NLP context is a new approach. We propose to combine these fields of interest for a multi-purpose concordance for ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- TAL
دوره 49 شماره
صفحات -
تاریخ انتشار 2008